An error-resilient redundant subspace correction method

Authors

  • Tao Cui
  • Jinchao Xu
  • Chen-Song Zhang

Abstract

As we stride toward the exascale era, the increasing complexity of supercomputers means that hard and soft errors cause more and more problems in high-performance scientific and engineering computation. To improve the reliability of computing systems (that is, to increase their mean time to failure), much effort has been devoted to developing techniques for forecasting, preventing, and recovering from errors at different levels, including architecture, application, and algorithm. In this paper, we focus on algorithmic error-resilient iterative linear solvers and introduce a redundant subspace correction method. Using a general framework of redundant subspace corrections, we construct iterative methods with the following properties: (1) they maintain convergence when an error occurs, assuming the error is detectable; (2) they introduce low computational overhead when no error occurs; (3) they require only a small amount of local (point-to-point) communication compared to traditional methods and maintain good load balance; (4) they improve the mean time to failure. With the proposed method, we can improve the reliability of many scientific and engineering applications. Preliminary numerical experiments demonstrate the efficiency and effectiveness of the new subspace correction method.
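The abstract does not spell out the construction, so what follows is only a toy sketch of the underlying idea, not the authors' method. It assumes a 1D Poisson model problem $A u = f$ with $A = \mathrm{tridiag}(-1, 2, -1)$ and $f = 1$, builds a twofold redundant covering of the unknowns from two offset block partitions, and runs successive (multiplicative) subspace corrections with exact local solves. A detected error is modeled, by assumption, as a block whose correction is discarded; because every unknown also lies in a second block, the remaining subspaces still span the whole space and the iteration still converges. All function names (`redundant_blocks`, `ssc_sweep`, `solve`, and so on) are hypothetical, and the sketch ignores the parallel communication and load-balancing aspects the abstract mentions.

```python
import math


def apply_laplacian(u):
    """Return A u for the 1D Dirichlet Laplacian A = tridiag(-1, 2, -1)."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def solve_local(r):
    """Exact local solve of tridiag(-1, 2, -1) e = r (Thomas algorithm)."""
    m = len(r)
    c, d = [0.0] * m, [0.0] * m  # modified super-diagonal and right-hand side
    c[0], d[0] = -0.5, r[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    e = [0.0] * m
    e[-1] = d[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        e[i] = d[i] - c[i] * e[i + 1]
    return e


def redundant_blocks(n, size):
    """Two offset partitions of range(n): every unknown lies in exactly two blocks."""
    assert size >= 2 and n >= size
    half = size // 2
    first = [(s, min(s + size, n)) for s in range(0, n, size)]
    second = [(0, half)] + [(s, min(s + size, n)) for s in range(half, n, size)]
    return first + second


def ssc_sweep(u, f, blocks, failed):
    """One successive subspace correction sweep over all blocks, in order.

    Assumption: a block index in `failed` models a subspace whose correction
    was hit by a *detected* error, so its update is discarded, not applied.
    """
    for k, (a, b) in enumerate(blocks):
        if k in failed:
            continue
        r = [fi - ai for fi, ai in zip(f, apply_laplacian(u))]
        for i, ei in zip(range(a, b), solve_local(r[a:b])):
            u[i] += ei


def solve(n=16, size=4, failed=frozenset(), sweeps=5000, tol=1e-10):
    """Repeat SSC sweeps on A u = 1 until the max-norm residual is below tol."""
    f, u = [1.0] * n, [0.0] * n
    blocks = redundant_blocks(n, size)
    res = math.inf
    for it in range(1, sweeps + 1):
        ssc_sweep(u, f, blocks, failed)
        res = max(abs(fi - ai) for fi, ai in zip(f, apply_laplacian(u)))
        if res < tol:
            return u, it, res
    return u, sweeps, res


if __name__ == "__main__":
    # With n=16, size=4, blocks 0-3 form the first partition; dropping all of
    # them leaves the second partition, which still covers every unknown.
    for label, failed in [("no errors", set()), ("first partition lost", {0, 1, 2, 3})]:
        _, it, res = solve(failed=failed)
        print(f"{label}: {it} sweeps, residual {res:.1e}")
```

The redundancy is what buys resilience in this sketch: if both blocks covering some unknown are lost (for example blocks 0 and 4, the only ones containing unknowns 0 and 1), those unknowns are never corrected and the residual stalls at 1.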

Similar resources

An approach to fault detection and correction in design of systems using of Turbo codes

We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes in order to obtain flexibility in the design of error-correcting codes. Code-combining techniques are very effective, and turbo codes are one example of such codes. The algorithm-based fault tolerance techniques that detect errors rely on the c...

On construction of resilient functions

An $(n, m, t)$ resilient function is a function $f\colon \{0,1\}^n \to \{0,1\}^m$ such that every possible output m-tuple is equally likely to occur when the values of t arbitrary inputs are fixed by an opponent and the remaining $n - t$ input bits are chosen independently at random. The existence of resilient functions has been largely studied in terms of lower and upper bounds. The construction of such function...

Error Performance Analysis of Maximum Rank Distance Codes

In this paper, we first introduce the concept of elementary linear subspace, which has similar properties to those of a set of coordinates. We then use elementary linear subspaces to derive properties of maximum rank distance (MRD) codes that parallel those of maximum distance separable (MDS) codes. Using these properties, we show that, for MRD codes with error correction capability t, the deco...

Error Resilient Image Transmission Over a Bluetooth Network

This paper proposes the use of an Error-Resilient Entropy Coding (EREC) scheme for the transmission of JPEG images across a Bluetooth wireless network. Bluetooth, being an ad-hoc networking standard, can be used in multiple environments, often under difficult conditions, leading to limited bandwidth and significant Bit Error Rates (BER). Compression and transmission provisions, that avoid catas...

Systematic Lossy Error Protection of Video Signals (PhD dissertation, Department of Electrical Engineering, Stanford University)

This thesis addresses the problem of error-resilient video transmission. In most video transmission applications, a video signal is compressed, packetized and transmitted over an error-prone channel. Owing to multipath fading on wireless channels and/or congestion in the Internet, some video packets are lost or arrive in error. When feedback is unavailable or limited, this problem is traditiona...

Journal:
  • Computat. and Visualiz. in Science

Volume 18

Publication date: 2017